Cross-corpora spoken language identification with domain diversification and generalization

Authors

Abstract

This work addresses the cross-corpora generalization issue for the low-resourced spoken language identification (LID) problem. We have conducted experiments in the context of Indian LID and identified strikingly poor cross-corpora generalization due to corpora-dependent non-lingual biases. Our contribution in this work is twofold. First, we propose domain diversification, which diversifies the limited training data using different audio augmentation methods. We then propose the concept of maximally diversity-aware cascaded augmentations and optimize the augmentation fold-factor for effective diversification of the training data. Second, we introduce the idea of domain generalization by considering the augmentation methods as pseudo-domains. Towards this, we investigate both domain-invariant and domain-aware approaches. Our LID system is based on the state-of-the-art emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) architecture. We have conducted extensive experiments with three widely used corpora for Indian LID research. In addition, we conduct a final blind evaluation of our proposed methods on a subset of the VoxLingua107 corpus collected in the wild. Our experiments demonstrate that the proposed domain diversification is more promising than commonly used simple augmentation methods. The study also reveals that domain generalization is a more effective solution than domain diversification. We notice that domain-aware learning performs better for same-corpora LID, whereas domain-invariant learning is more suitable for cross-corpora generalization. Compared to the basic ECAPA-TDNN, its proposed extensions improve the cross-corpora EER by up to 5.23%. In contrast, the domain-aware extensions improve performance in the same-corpora test scenarios.
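The two ideas in the abstract can be made concrete with a short sketch. The snippet below (not the authors' code) expands a small training set with a few audio augmentation methods and records which method produced each copy, so that the augmentation identity can later serve as a pseudo-domain label for a domain-invariant or domain-aware objective on top of an ECAPA-TDNN style embedding. The augmentation functions, the fold-factor value, and the cascading choice are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of domain diversification with pseudo-domain labels (assumed setup).
import numpy as np

SAMPLE_RATE = 16000

def add_noise(wav: np.ndarray, snr_db: float = 15.0) -> np.ndarray:
    """Additive white noise at a fixed SNR (stand-in for real noise corpora)."""
    signal_power = np.mean(wav ** 2) + 1e-12
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wav + np.random.randn(len(wav)) * np.sqrt(noise_power)

def speed_perturb(wav: np.ndarray, factor: float = 1.1) -> np.ndarray:
    """Naive speed perturbation by resampling the time axis."""
    idx = np.arange(0, len(wav), factor)
    return np.interp(idx, np.arange(len(wav)), wav)

def gain_perturb(wav: np.ndarray, db: float = -6.0) -> np.ndarray:
    """Simple loudness change."""
    return wav * (10 ** (db / 20))

# Each augmentation method acts as one pseudo-domain; cascading two methods
# (e.g. speed + noise) yields a further pseudo-domain, loosely analogous to the
# cascaded augmentations described in the abstract.
PSEUDO_DOMAINS = {
    0: lambda w: w,                            # original corpus condition
    1: add_noise,
    2: speed_perturb,
    3: gain_perturb,
    4: lambda w: add_noise(speed_perturb(w)),  # cascaded augmentation
}

def diversify(dataset, fold_factor: int = 3):
    """Expand (wav, language_label) pairs into (wav, language_label, pseudo_domain).

    fold_factor controls how many augmented copies are added per utterance;
    the paper optimizes this value, here it is just a free parameter.
    """
    out = []
    for wav, lang in dataset:
        out.append((wav, lang, 0))             # keep the clean utterance
        for domain_id in np.random.choice(
                list(PSEUDO_DOMAINS)[1:], size=fold_factor, replace=False):
            out.append((PSEUDO_DOMAINS[domain_id](wav), lang, int(domain_id)))
    return out

if __name__ == "__main__":
    toy = [(np.random.randn(SAMPLE_RATE), "hin"), (np.random.randn(SAMPLE_RATE), "ben")]
    diversified = diversify(toy, fold_factor=3)
    print(len(diversified), "training examples,",
          len({d for _, _, d in diversified}), "pseudo-domains")
```

In a domain-invariant setup the pseudo-domain label is typically predicted through an adversarial branch (e.g. gradient reversal) so that the embedding discards augmentation-specific cues, while a domain-aware setup conditions the classifier on the same label; the abstract reports that the paper compares both directions, though the exact training objectives are not shown here.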


Similar resources

Word clustering with parallel spoken language corpora

In this paper we introduce a word clustering algorithm which uses a bilingual, parallel corpus to group together words in the source and target language. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implici...
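For orientation, the following is a toy sketch of the monolingual mutual-information clustering that this line of work builds on (Brown-style class bigrams with greedy merging); the bilingual translation-model term that the paper adds is omitted, and all names and the toy data are illustrative.

```python
# Toy greedy mutual-information word clustering (monolingual baseline only).
from collections import Counter
from itertools import combinations
import math

def class_bigram_mi(tokens, assignment):
    """Average mutual information of adjacent class pairs under a clustering."""
    pairs = Counter((assignment[a], assignment[b]) for a, b in zip(tokens, tokens[1:]))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pairs.items():
        left[c1] += n
        right[c2] += n
    mi = 0.0
    for (c1, c2), n in pairs.items():
        p = n / total
        mi += p * math.log2(p / ((left[c1] / total) * (right[c2] / total)))
    return mi

def greedy_merge_step(tokens, assignment):
    """Merge the pair of clusters whose merge loses the least mutual information."""
    clusters = set(assignment.values())
    best = None
    for c1, c2 in combinations(sorted(clusters), 2):
        trial = {w: (c1 if c == c2 else c) for w, c in assignment.items()}
        mi = class_bigram_mi(tokens, trial)
        if best is None or mi > best[0]:
            best = (mi, trial)
    return best[1]

if __name__ == "__main__":
    text = "the cat sat on the mat the dog sat on the rug".split()
    clusters = {w: w for w in set(text)}   # start with one cluster per word type
    while len(set(clusters.values())) > 3:
        clusters = greedy_merge_step(text, clusters)
    print(clusters)
```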


Advanced Distribution Means for Spoken Language Corpora

This report outlines the distribution of Spoken Language Corpora (SLC) on traditional CD-ROM media and a new approach via network. High-capacity CD-ROMs are being introduced, but this is only a marginal improvement with respect to the distribution of SLC. Network access, however, offers many opportunities: customized SLC, on-line access, and a high degree of protection. However, for network access to be ...


Multi-level annotation for spoken language corpora

The constitution of multi-level databases integrating, for example, both prosodic and morphosyntactic levels of representation presents a number of problems, some specific to the individual domains, and others concerning the integration of the two domains. It is argued that the formalism of annotation graphs provides an adequate solution to these problems, which can be implemented in an XML rep...
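A minimal sketch of the annotation-graph idea referred to above, assuming the usual formulation in which nodes are (optionally time-anchored) points and each annotation is a labelled arc between two nodes, so prosodic and morphosyntactic layers can coexist in one structure; the classes and example values are illustrative only and not taken from the paper.

```python
# Illustrative annotation-graph data structure for multi-level spoken corpora.
from dataclasses import dataclass, field

@dataclass
class Arc:
    start: int   # index of the source node
    end: int     # index of the target node
    level: str   # annotation layer, e.g. "word", "pos", "prosody"
    label: str

@dataclass
class AnnotationGraph:
    node_times: list = field(default_factory=list)  # optional time anchors (seconds)
    arcs: list = field(default_factory=list)

    def add_arc(self, start, end, level, label):
        self.arcs.append(Arc(start, end, level, label))

    def layer(self, level):
        return [a for a in self.arcs if a.level == level]

if __name__ == "__main__":
    g = AnnotationGraph(node_times=[0.00, 0.31, 0.55, None])
    g.add_arc(0, 1, "word", "hello")
    g.add_arc(1, 2, "word", "world")
    g.add_arc(0, 1, "pos", "UH")
    g.add_arc(0, 2, "prosody", "H*")   # a prosodic unit spanning both words
    print([a.label for a in g.layer("word")])
```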


Concordancing for parallel spoken language corpora

Concordancing is one of the oldest corpus analysis tools, especially for written corpora. In NLP, concordancing appears in the training of speech-recognition systems. Additionally, comparative studies of different languages result in parallel corpora. Concordancing for these corpora in an NLP context is a new approach. We propose to combine these fields of interest for a multi-purpose concordance for ...
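As a point of reference, the snippet below is a minimal keyword-in-context (KWIC) concordancer; it does not attempt the parallel, bilingual alignment the paper proposes, and the toy corpus and function name are assumptions for illustration.

```python
# Minimal KWIC concordancer over a tokenized corpus.
def kwic(tokens, keyword, window=4):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

if __name__ == "__main__":
    corpus = "the quick brown fox jumps over the lazy dog because the fox is quick".split()
    for left, kw, right in kwic(corpus, "fox"):
        print(f"{left:>30} | {kw} | {right}")
```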


Detecting Annotation Errors in Spoken Language Corpora

Consistency of corpus annotation is an essential property for the many uses of annotated corpora in computational and theoretical linguistics. While some research addresses the detection of inconsistencies in part-of-speech and other positional annotation (van Halteren, 2000; Eskin, 2000; Dickinson and Meurers, 2003a), more recently, work has also started to address errors in syntactic and other...
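The snippet below sketches variation-based inconsistency detection in positional (e.g. part-of-speech) annotation in the spirit of the variation n-gram idea cited above: the same word in the same local context carrying different tags is flagged for inspection. The context size, function name, and toy data are assumptions, not the cited papers' implementation.

```python
# Flag words that receive different tags in identical local contexts.
from collections import defaultdict

def variation_candidates(tagged_sentences, context=1):
    """Map (left context, word, right context) -> set of tags seen for it."""
    seen = defaultdict(set)
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (word, tag) in enumerate(sent):
            left = tuple(words[max(0, i - context):i])
            right = tuple(words[i + 1:i + 1 + context])
            seen[(left, word, right)].add(tag)
    return {k: tags for k, tags in seen.items() if len(tags) > 1}

if __name__ == "__main__":
    data = [
        [("to", "TO"), ("run", "VB"), ("fast", "RB")],
        [("to", "TO"), ("run", "NN"), ("fast", "RB")],  # same context, different tag
    ]
    for (left, word, right), tags in variation_candidates(data).items():
        print(f"possible inconsistency: {left} [{word}] {right} -> {sorted(tags)}")
```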



Journal

Journal title: Computer Speech & Language

Year: 2023

ISSN: 1095-8363, 0885-2308

DOI: https://doi.org/10.1016/j.csl.2023.101489